An integrated approach to disambiguation
This paper presents an extension for WSD of an integrated architecture originally designed for Semantic Parsing. In the proposed framework, both tasks can be addressed simultaneously and collaborate with each other. The feasibility and robustness of the proposed architecture have been tested on a well-defined WSD task (the SENSEVAL-II English Lexical Sample) using automatically acquired syntactic-semantic models.
Syntactic parsing of unrestricted Spanish text
This research focuses on the syntactic parsing of morphologically tagged corpora. A proposal for a corpus-oriented Spanish grammar is presented in this document. This work has been developed in the framework of the ITEM project, whose main goal is to provide a multilingual background for information extraction and retrieval tasks. The main goal of the Tacat analyser is to provide a way of obtaining large amounts of bracketed and parsed corpora, in both general and specific domains. Tacat uses context-free grammars and takes as input the categories of the Parole specification. The incremental methodology that we use allows us to recognise different levels of complexity in the analysis and to produce compatible outputs for all the grammars.
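The kind of context-free analysis described above can be illustrated with a minimal CKY recogniser. The toy grammar and sentence below are hand-made assumptions for illustration only, not Tacat's actual Parole-based Spanish grammar:

```python
# Minimal CKY recogniser for a toy context-free grammar in Chomsky
# normal form. Hypothetical fragment of Spanish, for illustration only.
from itertools import product

# Binary rules A -> B C, and a lexicon of word -> pre-terminal categories.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"el": {"Det"}, "perro": {"N"}, "gato": {"N"}, "ve": {"V"}}

def cky(words):
    n = len(words)
    # chart[i][j] holds the non-terminals that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    if (b, c) in BINARY:
                        chart[i][j].add(BINARY[(b, c)])
    return chart[0][n]

print(cky("el perro ve el gato".split()))  # contains 'S' if grammatical
```

A full chart parser would also store backpointers to recover the bracketed trees that the abstract mentions; the recogniser above only decides membership.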
Through the Eyes of VERTa
This paper describes a practical demo of VERTa for Spanish. VERTa is an MT evaluation metric that combines linguistic features at different levels. It has been developed for English and Spanish but can easily be adapted to other languages. VERTa can be used to evaluate the adequacy, fluency, and ranking of sentences. In this paper, VERTa's modules are described briefly, as well as its graphical interface, which provides information on VERTa's performance and possible MT errors.

This work has been funded by the Spanish Government (project TUNER, TIN2015-65308-C5-1-R).
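The idea of combining linguistic features at different levels into one score can be sketched as follows. The levels, features, and weights here are hypothetical toy choices, not VERTa's actual modules:

```python
# Sketch: combine match scores at several linguistic levels into a
# single sentence score. Weights and levels are illustrative only.
from collections import Counter

def overlap_f1(hyp, ref):
    """Harmonic mean of precision/recall over bag-of-token overlap."""
    common = sum((Counter(hyp) & Counter(ref)).values())
    if common == 0:
        return 0.0
    p, r = common / len(hyp), common / len(ref)
    return 2 * p * r / (p + r)

def combined_score(hyp_tokens, hyp_lemmas, ref_tokens, ref_lemmas,
                   weights=(0.6, 0.4)):
    # Each linguistic level contributes an F1; the final score is a
    # weighted sum, so richer levels can compensate for surface mismatch.
    lexical = overlap_f1(hyp_tokens, ref_tokens)
    lemma = overlap_f1(hyp_lemmas, ref_lemmas)
    return weights[0] * lexical + weights[1] * lemma
```

A real metric of this family would add further levels (morphology, dependencies, semantics) as extra weighted terms in the same combination.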
A proposal for a shallow ontologization of WordNet
This paper presents the work carried out towards the so-called shallow ontologization of WordNet, which is argued to be a way to overcome most of the many structural problems of the widely used lexical knowledge base. The expected result is a multilingual resource more suitable than the existing ones for large-scale semantic processing.
The MEANING Project
Progress is being made in Natural Language Processing (NLP), but we are still a long way from Natural Language Understanding. An important step towards this goal is the development of technologies and resources that deal with concepts rather than words. However, to be able to build the next generation of intelligent open-domain Human Language Technology (HLT) application systems, we need to solve two complementary and intermediate tasks: Word Sense Disambiguation (WSD) and the automatic large-scale enrichment of Lexical Knowledge Bases.

The MEANING Project is funded by the EU 5th Framework IST Programme.
Spell-checking in Spanish: the case of diacritic accents
This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with the focus on an orthographically rich language such as Spanish. We argue that, despite the large volume of work published on the topic of diacritization, currently available spell-checking tools have still not found a proper solution to the problem in those cases where both forms of a word are listed in the checker's dictionary. This is the case, for instance, when a word form exists with and without diacritics, such as continuo 'continuous' and continuó 'he/she/it continued', or when different diacritics make other word distinctions, as in continúo 'I continue'. We propose a very simple solution based on a word bigram model derived from correctly typed Spanish texts and evaluate the ability of this model to restore diacritics in artificial as well as real errors. The case of diacritics is only meant to be an example of the possible applications for this idea, yet we believe that the same method could be applied to other kinds of orthographic or even grammatical errors. Moreover, given that no explicit linguistic knowledge is required, the proposed model can be used with other languages provided that a large normative corpus is available.
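The bigram idea can be sketched in a few lines: among the dictionary variants of an ambiguous form, pick the one whose bigram with the preceding word is most frequent in a normative corpus. The variant table and counts below are hypothetical toy data, not figures from the paper:

```python
# Sketch of bigram-based diacritic restoration (toy data, illustration
# only). A real system would derive BIGRAMS from a large normative
# corpus of correctly typed text.
VARIANTS = {"continuo": ["continuo", "continuó", "continúo"]}

# Hypothetical bigram counts (previous word, candidate form) -> frequency.
BIGRAMS = {
    ("trabajo", "continuo"): 12,
    ("trabajo", "continuó"): 3,
    ("él", "continuó"): 25,
    ("él", "continúo"): 1,
}

def restore(prev_word, typed_form):
    """Return the diacritic variant most frequent after the previous word."""
    candidates = VARIANTS.get(typed_form, [typed_form])
    return max(candidates, key=lambda w: BIGRAMS.get((prev_word, w), 0))

print(restore("él", "continuo"))      # picks the variant seen after 'él'
print(restore("trabajo", "continuo"))
```

Smoothing and back-off to unigram counts would be needed for unseen contexts; the sketch simply falls back to the first listed variant when no bigram is attested.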
Automatic acquisition of sense examples using ExRetriever
A current research line for word sense disambiguation (WSD) focuses on the use of supervised machine learning techniques. One of the drawbacks of using such techniques is that previously sense-annotated data is required. This paper presents ExRetriever, a new software tool for automatically acquiring large sets of sense-tagged examples from large collections of text and the Web. ExRetriever exploits the knowledge contained in large-scale knowledge bases (e.g., WordNet) to build complex queries, each of them characterising particular senses of a word. These examples can be used as training instances for supervised WSD algorithms.
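The query-building step described above can be sketched as follows. The mini sense inventory is hand-made (not real WordNet data), and the boolean query syntax is a generic assumption, not ExRetriever's actual format:

```python
# Sketch: turn a sense inventory into one retrieval query per sense.
# Toy inventory and generic boolean syntax, for illustration only.

# Hypothetical sense inventory: sense id -> cue words (synonyms, gloss terms).
SENSES = {
    "bank#1": ["bank", "river", "shore", "slope"],
    "bank#2": ["bank", "money", "deposit", "loan"],
}

def build_query(target, cues):
    """Pair the target word with the cue words of one of its senses."""
    others = [c for c in cues if c != target]
    return f'"{target}" AND ({" OR ".join(others)})'

queries = {sid: build_query("bank", cues) for sid, cues in SENSES.items()}
for sid, q in queries.items():
    print(sid, "->", q)
# Each query is sent to a search engine or corpus indexer; the snippets
# it retrieves are then tagged with that query's sense id.
```

The quality of the harvested examples depends on how discriminative the cue words are, which is why a rich knowledge base matters for this approach.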
TXALA: a free dependency parser for Spanish
In this demo we present the first version of Txala, a dependency parser for Spanish developed under the LGPL license. This parser is framed within the development of a free-software platform for Machine Translation. Given the lack of this kind of syntactic parser for Spanish, the tool is essential for the development of NLP in Spanish.

This research has been partially funded by the Ministerio de Industria, Turismo y Comercio (PROFIT FIT-340101-2004-3).
Starting up the Multilingual Central Repository
This paper describes the initial design of the Multilingual Central Repository. The first version of the MCR integrates, within the EuroWordNet framework, five local wordnets (including three versions of the English WordNet from Princeton), the EuroWordNet Top Ontology, MultiWordNet Domains, and hundreds of thousands of new semantic relations and properties automatically acquired from corpora. In fact, the resulting MCR constitutes the largest and richest multilingual lexical knowledge base ever built.

This research has been partially funded by the Spanish Research Department (HERMES TIC2000-0335-C03-02) and by the European Commission (MEANING IST-2001-34460).